Skip to content

[PyTorch] Skip test_nvfp4_partial_cast_matches_full test when NVFP4 is not available.#2735

Open
ksivaman wants to merge 1 commit intoNVIDIA:mainfrom
ksivaman:fix_h100_ci
Open

[PyTorch] Skip test_nvfp4_partial_cast_matches_full test when NVFP4 is not available.#2735
ksivaman wants to merge 1 commit intoNVIDIA:mainfrom
ksivaman:fix_h100_ci

Conversation

@ksivaman
Copy link
Member

@ksivaman ksivaman commented Mar 4, 2026

Description

#2691 introduced this test but the NVFP4 condition wasn't checked properly.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Skip test_nvfp4_partial_cast_matches_full test when NVFP4 is not available.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
@ksivaman ksivaman requested a review from KshitijLakhani March 4, 2026 19:32
@ksivaman
Copy link
Member Author

ksivaman commented Mar 4, 2026

/te-ci pytorch L1

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 4, 2026

Greptile Summary

This PR fixes a misplaced pytest.skip() call introduced in #2691. The NVFP4 availability guard was placed inside _test_nvfp4_partial_cast_matches_full, a worker function that runs inside a torch.distributed.run subprocess — completely outside the pytest process. Calling pytest.skip() there raised a BaseException that was uncaught, causing the subprocess to exit non-zero and subprocess.run(..., check=True) to throw subprocess.CalledProcessError, making the test fail on machines without NVFP4 instead of skipping it.

The fix moves the guard to test_nvfp4_partial_cast_matches_full, which runs directly under pytest, so pytest.skip() is handled correctly and the subprocess is never launched on unsupported hardware.

  • The root cause is correctly identified: pytest.skip() must only be called from code running under the pytest process.
  • The worker function _test_nvfp4_partial_cast_matches_full is only reachable via the subprocess launched by the now-guarded test function, so removing the check from the worker is safe.
  • No other code paths call _test_nvfp4_partial_cast_matches_full directly.
  • Change is minimal and targeted with no side-effects on other tests.

Confidence Score: 5/5

  • This PR is safe to merge; it is a small, well-scoped bug fix with no risk of regressions.
  • The change is a single-file, four-line relocation of a pytest.skip() call from a subprocess worker to the correct pytest-context test function. The logic is sound, the fix is the minimal correct solution, and there are no new dependencies or side-effects introduced.
  • No files require special attention.

Important Files Changed

Filename Overview
tests/pytorch/distributed/test_cast_master_weights_to_fp8.py Moves is_nvfp4_available guard from subprocess worker _test_nvfp4_partial_cast_matches_full to the pytest-facing test_nvfp4_partial_cast_matches_full, correctly preventing a spurious subprocess.CalledProcessError when NVFP4 is unavailable.

Sequence Diagram

sequenceDiagram
    participant pytest as pytest runner
    participant T as test_nvfp4_partial_cast_matches_full
    participant SP as subprocess (torch.distributed.run)
    participant W as _test_nvfp4_partial_cast_matches_full (worker)

    pytest->>T: invoke test

    alt NVFP4 not available (NEW behaviour)
        T->>T: is_nvfp4_available() → False
        T->>pytest: pytest.skip(reason)
        Note over SP,W: subprocess never launched
    else NVFP4 available
        T->>SP: subprocess.run(..., check=True)
        SP->>W: run_parallel_nvfp4_partial_cast_test()
        W->>W: distributed quantisation assertions
        W-->>SP: success / AssertionError
        SP-->>T: exit 0 / non-zero
        T-->>pytest: pass / CalledProcessError
    end
Loading

Last reviewed commit: c1fb5d7

Copy link
Collaborator

@KshitijLakhani KshitijLakhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !
Good to merge once CI passes fully !
Thanks !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants